SingleCellAlleleExperiment 0.0.0.9000
The SingleCellAlleleExperiment (SCAE) class is a container for storing and handling allele-aware quantification data for immune genes. The SCAE class is derived
from the SingleCellExperiment(SCE) class and uses the same overall object architecture. However, multiple data layers are integrated into the object during object generation.
Data from a novel allele-aware quantification method contains information about alleles for immune genes. During object generation, the allele information is aggregated into two additional data layers
via an ontology based MHC design principle and appended to the initial raw data using a lookup table. Thus the final SCAE object contains non-immune genes, alleles for immune genes, an immune gene layer and a functional class layer (refer to Figure 1).
For example, the counts of the alleles A*01:01:01:01 and A*02:01:01:01 that are present in the raw input data will be summed up and transformed into the HLA-A immune gene layer. Next, all counts for the present HLA-class I immune genes will be summed up and transformed into the HLA-class I functional class layer. The information necessary to perform these transformations is saved in a lookup table retrieved from the IPD-IMGT/HLA database.
The implemented object follows similar conventions like the SCE class, where rows should represent features (genes, transcripts) and columns should represent cells. Established single cell packages like scater and scran can be used with the SCAE object to perform downstream analysis on immune gene expression.
This allows new insights on functional as well as allele level to uncover the high diversity of immune genes.
Figure 1: Scheme of SingleCellAlleleExperiment object structure with lookup table.
The read in function of the SCAE package readAlleleCounts expects specific files and file identifiers.
The stated input directory should contain the following files:
The used dataset for later downstream analysis and testing of the functionalities of the multi-layer object is a dual-center, two-cohort study where whole blood and peripheral blood mononuclear cells underwent scRNA-sequencing. The whole transcriptome dataset is under controlled access and not for public use. The corresponding publication Severe COVID-19 Is Marked by a Dysregulated Myeloid Cell Compartment can be found here.
The following packages are abundant for performing the downstream analysis and visualization.
library(SingleCellAlleleExperiment)
library(scran)
library(scater)
library(gridExtra)
library(tidyverse)
library(patchwork)
library(cowplot)
library(ggplot2)
library(viridis)
First the directory path is stated. The directory should contain all expected files.
dir_path <- "../inst/extdata"
list.files("../inst/extdata")
## [1] "cells_x_genes.barcodes.txt" "cells_x_genes.genes.txt"
## [3] "cells_x_genes.mtx" "lookup_table_HLA_only.csv"
## [5] "scae_advanced.png"
Starting the workflow with assessing the quality of the underlying dataset. Prior to filtering out low-quality cells, the plotKnee() function of the SCAE package can be used. The function offers two different options for generating a knee plot. A default one (plot B) and an advanced one (plot A) with a computed knee and inflection point on the curve, which pose suggestions for potential threshold for filtering out low-quality cells.
ega_knee_adv <- plotKnee(dir_path, "advanced")
ega_knee_def <- plotKnee(dir_path, "default")
knee <- ega_knee_adv + labs(title = "A") + theme_bw() +
ega_knee_def + labs(title = "B") + theme_bw() +
theme(axis.title.y=element_blank(), legend.position = "none")
knee
In plot A, the advanced knee-plot, the inflection point suggests a threshold for filtering out low-quality cells at about 110 total UMI counts.
This threshold is passed to the filter_threshold parameter of the readAlleleCounts() function.
The exemplary dataset is a whole transcriptome dataset, which is why the exp_type parameter is set to WTA. The alternative would be to state Amplicon. The symbols parameter is for retrieving NCBI gene names for the ensemble gene names present in the raw data with the help of the biomaRt package. This parameter is not abundant to state at function execution as the standard value is biomart. For offline execution, orgdb is suggested, which uses the org.Hs.eg.db package.
SCAE objects implements the same object architecture as SCE objects.
scae <- readAlleleCounts(dir_path,
sample.names = "EGA_WTA",
filter_threshold = 110,
exp_type = "WTA",
symbols = "biomart")
## [1] "Runtime check (1/2) Read_in: 6.24990248680115"
## [1] " Generating SCAE (1/5) extending rowData: 6.8026978969574"
## [1] " Generating SCAE (2/5) filtering and normalization: 0.32849645614624"
## [1] " Generating SCAE (3/5) alleles2genes: 2.53575468063354"
## [1] " Generating SCAE (4/5) genes2functional: 1.63666749000549"
## [1] " Generating SCAE (5/5) log_transform: 1.63666749000549"
## [1] "Runtime check (2/2) Generating SCAE completed: 12.8921468257904"
## [1] "Total runtime, completed read_in, filtering and normalization and generating scae object 19.1424615383148"
scae
## class: SingleCellAlleleExperiment
## dim: 62718 47788
## metadata(0):
## assays(2): counts logcounts
## rownames(62718): ENSG00000160072.20 ENSG00000279928.2 ... HLA_class_I
## HLA_class_II
## rowData names(4): Ensembl.ID Symbol NI_I Quant_type
## colnames(47788): AAATCCTGTAAACGTACCAAAGTCATT
## AAATCCTGTAAACGTACCACGTGGGAG ... TTGTTCCAATTGGTATGACATAGGTCA
## TTGTTCCAATTGGTATGAGCAACATTA
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
Two new classification columns are introduced in the rowData slot. Namely the NI_I column and Quant_type column. Both columns are used to identify each row of the object to its corresponding data layer (see figure 1).
rowData(scae)
## DataFrame with 62718 rows and 4 columns
## Ensembl.ID Symbol NI_I Quant_type
## <character> <character> <character> <character>
## ENSG00000160072.20 ENSG00000160072.20 ATAD3B NI G
## ENSG00000279928.2 ENSG00000279928.2 DDX11L17 NI G
## ENSG00000228037.1 ENSG00000228037.1 NA NI G
## ENSG00000142611.17 ENSG00000142611.17 PRDM16 NI G
## ENSG00000284616.1 ENSG00000284616.1 NA NI G
## ... ... ... ... ...
## HLA-DQA1 HLA-DQA1 HLA-DQA1 I G
## HLA-DQB1 HLA-DQB1 HLA-DQB1 I G
## HLA-DPB1 HLA-DPB1 HLA-DPB1 I G
## HLA_class_I HLA_class_I HLA_class_I I F
## HLA_class_II HLA_class_II HLA_class_II I F
As the object extends the count matrix during the object generation, its abundant to compute scaling factors on the raw data prior to extending and integrating the data layers. The scaling factors are used for scaling normalization in a later step of the SCAE constructor.
colData(scae)
## DataFrame with 47788 rows and 3 columns
## Sample Barcode sizeFactor
## <character> <character> <numeric>
## AAATCCTGTAAACGTACCAAAGTCATT EGA_WTA AAATCCTGTAAACGTACCAA.. 0.361248
## AAATCCTGTAAACGTACCACGTGGGAG EGA_WTA AAATCCTGTAAACGTACCAC.. 0.188150
## AAATCCTGTAAACGTACCTCTCACGGA EGA_WTA AAATCCTGTAAACGTACCTC.. 0.457581
## AAATCCTGTAAACGTACCTGCATAGTA EGA_WTA AAATCCTGTAAACGTACCTG.. 0.871510
## AAATCCTGTAAACTGCGCAACAACGCG EGA_WTA AAATCCTGTAAACTGCGCAA.. 1.846879
## ... ... ... ...
## TTGTTCCAATTGGTATGAATCATTCTG EGA_WTA TTGTTCCAATTGGTATGAAT.. 0.931718
## TTGTTCCAATTGGTATGAATGGGACTC EGA_WTA TTGTTCCAATTGGTATGAAT.. 2.170497
## TTGTTCCAATTGGTATGACATAACGTT EGA_WTA TTGTTCCAATTGGTATGACA.. 1.467569
## TTGTTCCAATTGGTATGACATAGGTCA EGA_WTA TTGTTCCAATTGGTATGACA.. 0.704433
## TTGTTCCAATTGGTATGAGCAACATTA EGA_WTA TTGTTCCAATTGGTATGAGC.. 2.126846
Additionally to the established getters from the SCE package, new getters are implemented to retrieve the different data layers integrated in the SCAE object.
get_nigenes(scae)
## class: SingleCellAlleleExperiment
## dim: 62695 47788
## metadata(0):
## assays(2): counts logcounts
## rownames(62695): ENSG00000160072.20 ENSG00000279928.2 ...
## ENSG00000277475.1 ENSG00000275405.1
## rowData names(4): Ensembl.ID Symbol NI_I Quant_type
## colnames(47788): AAATCCTGTAAACGTACCAAAGTCATT
## AAATCCTGTAAACGTACCACGTGGGAG ... TTGTTCCAATTGGTATGACATAGGTCA
## TTGTTCCAATTGGTATGAGCAACATTA
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
head(rownames(get_nigenes(scae)))
## [1] "ENSG00000160072.20" "ENSG00000279928.2" "ENSG00000228037.1"
## [4] "ENSG00000142611.17" "ENSG00000284616.1" "ENSG00000157911.11"
get_alleles(scae)
## class: SingleCellAlleleExperiment
## dim: 14 47788
## metadata(0):
## assays(2): counts logcounts
## rownames(14): A*02:01:01:01 A*26:01:01:01 ... DPB1*10:01:01:01
## DPB1*13:01:01:01
## rowData names(4): Ensembl.ID Symbol NI_I Quant_type
## colnames(47788): AAATCCTGTAAACGTACCAAAGTCATT
## AAATCCTGTAAACGTACCACGTGGGAG ... TTGTTCCAATTGGTATGACATAGGTCA
## TTGTTCCAATTGGTATGAGCAACATTA
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
head(rownames(get_alleles(scae)))
## [1] "A*02:01:01:01" "A*26:01:01:01" "B*51:01:01:01" "B*57:01:01:01"
## [5] "C*04:01:01:01" "C*06:02:01:01"
get_agenes(scae)
## class: SingleCellAlleleExperiment
## dim: 7 47788
## metadata(0):
## assays(2): counts logcounts
## rownames(7): HLA-A HLA-B ... HLA-DQB1 HLA-DPB1
## rowData names(4): Ensembl.ID Symbol NI_I Quant_type
## colnames(47788): AAATCCTGTAAACGTACCAAAGTCATT
## AAATCCTGTAAACGTACCACGTGGGAG ... TTGTTCCAATTGGTATGACATAGGTCA
## TTGTTCCAATTGGTATGAGCAACATTA
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
head(rownames(get_agenes(scae)))
## [1] "HLA-A" "HLA-B" "HLA-C" "HLA-DRB1" "HLA-DQA1" "HLA-DQB1"
get_func(scae)
## class: SingleCellAlleleExperiment
## dim: 2 47788
## metadata(0):
## assays(2): counts logcounts
## rownames(2): HLA_class_I HLA_class_II
## rowData names(4): Ensembl.ID Symbol NI_I Quant_type
## colnames(47788): AAATCCTGTAAACGTACCAAAGTCATT
## AAATCCTGTAAACGTACCACGTGGGAG ... TTGTTCCAATTGGTATGACATAGGTCA
## TTGTTCCAATTGGTATGAGCAACATTA
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
head(rownames(get_func(scae)))
## [1] "HLA_class_I" "HLA_class_II"
In case the raw input data contains allele information that is not present in the lookup table, these alleles will be classified accordingly in the Quant_type classification column in the colData slot and can be retrieved with the get_unknown() getter. These alleles have potentially invalid identifiers and will not be considered in further generated data-layers.
get_unknown(scae)
## class: SingleCellAlleleExperiment
## dim: 0 47788
## metadata(0):
## assays(2): counts logcounts
## rownames(0):
## rowData names(4): Ensembl.ID Symbol NI_I Quant_type
## colnames(47788): AAATCCTGTAAACGTACCAAAGTCATT
## AAATCCTGTAAACGTACCACGTGGGAG ... TTGTTCCAATTGGTATGACATAGGTCA
## TTGTTCCAATTGGTATGAGCAACATTA
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
head(rownames(get_unknown(scae)))
## character(0)
Checking the expression for the allele-layer, immune gene layer and functional class layer. Allele identifiers are in the form of A*02:01:01:01. The immune genes are in the form of HLA-A and the functional classes HLA_class_I. Here we see that HLA-class I and HLA-C are the most abundant functional class and immune gene respectively, given the underlying dataset.
all_alleles <- c(rownames(get_alleles(scae)),
rownames(get_agenes(scae)),
rownames(get_func(scae)))
plotExpression(scae, all_alleles)
In the following sections, main steps for dimensional reduction are performed, offering insights into the different data layers of the SCAE object as well giving an idea on how to perform immune gene expression analysis.
Do we need to do this?????
scae <- scae[rowSums(counts(scae)) > 0, ]
scae
## class: SingleCellAlleleExperiment
## dim: 29923 47788
## metadata(0):
## assays(2): counts logcounts
## rownames(29923): ENSG00000160072.20 ENSG00000228037.1 ... HLA_class_I
## HLA_class_II
## rowData names(4): Ensembl.ID Symbol NI_I Quant_type
## colnames(47788): AAATCCTGTAAACGTACCAAAGTCATT
## AAATCCTGTAAACGTACCACGTGGGAG ... TTGTTCCAATTGGTATGACATAGGTCA
## TTGTTCCAATTGGTATGAGCAACATTA
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
The non-imune genes are combined with each of the integrated immune gene allele-aware layers to determine three different subsets.
redim_scae <- scae
ni_a <- redim_scae[c(rownames(get_nigenes(redim_scae)), rownames(get_alleles(redim_scae))), ]
ni_a
## class: SingleCellAlleleExperiment
## dim: 29914 47788
## metadata(0):
## assays(2): counts logcounts
## rownames(29914): ENSG00000160072.20 ENSG00000228037.1 ...
## DPB1*10:01:01:01 DPB1*13:01:01:01
## rowData names(4): Ensembl.ID Symbol NI_I Quant_type
## colnames(47788): AAATCCTGTAAACGTACCAAAGTCATT
## AAATCCTGTAAACGTACCACGTGGGAG ... TTGTTCCAATTGGTATGACATAGGTCA
## TTGTTCCAATTGGTATGAGCAACATTA
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
ni_g <- redim_scae[c(rownames(get_nigenes(redim_scae)), rownames(get_agenes(redim_scae))), ]
ni_g
## class: SingleCellAlleleExperiment
## dim: 29907 47788
## metadata(0):
## assays(2): counts logcounts
## rownames(29907): ENSG00000160072.20 ENSG00000228037.1 ... HLA-DQB1
## HLA-DPB1
## rowData names(4): Ensembl.ID Symbol NI_I Quant_type
## colnames(47788): AAATCCTGTAAACGTACCAAAGTCATT
## AAATCCTGTAAACGTACCACGTGGGAG ... TTGTTCCAATTGGTATGACATAGGTCA
## TTGTTCCAATTGGTATGAGCAACATTA
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
ni_f <- redim_scae[c(rownames(get_nigenes(redim_scae)), rownames(get_func(redim_scae))), ]
ni_f
## class: SingleCellAlleleExperiment
## dim: 29902 47788
## metadata(0):
## assays(2): counts logcounts
## rownames(29902): ENSG00000160072.20 ENSG00000228037.1 ... HLA_class_I
## HLA_class_II
## rowData names(4): Ensembl.ID Symbol NI_I Quant_type
## colnames(47788): AAATCCTGTAAACGTACCAAAGTCATT
## AAATCCTGTAAACGTACCACGTGGGAG ... TTGTTCCAATTGGTATGACATAGGTCA
## TTGTTCCAATTGGTATGAGCAACATTA
## colData names(3): Sample Barcode sizeFactor
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
Using the modelGeneVar() function prior to getTopHVGs. Both functions are part from the scran package.
df_ni_a <- modelGeneVar(ni_a)
head(df_ni_a)
## DataFrame with 6 rows and 6 columns
## mean total tech bio p.value
## <numeric> <numeric> <numeric> <numeric> <numeric>
## ENSG00000160072.20 0.002933487 0.003856521 0.003383365 4.73157e-04 1.75665e-02
## ENSG00000228037.1 0.000567862 0.000769625 0.000654949 1.14677e-04 4.17255e-03
## ENSG00000142611.17 0.000240115 0.000336065 0.000276939 5.91253e-05 6.49211e-04
## ENSG00000157911.11 0.001207261 0.001281901 0.001392406 -1.10505e-04 8.84075e-01
## ENSG00000269896.2 0.000138593 0.000201914 0.000159847 4.20670e-05 3.67483e-05
## ENSG00000228463.10 0.001309151 0.001739702 0.001509922 2.29781e-04 1.09352e-02
## FDR
## <numeric>
## ENSG00000160072.20 0.061996901
## ENSG00000228037.1 0.017757513
## ENSG00000142611.17 0.003390449
## ENSG00000157911.11 1.000000000
## ENSG00000269896.2 0.000244906
## ENSG00000228463.10 0.041368136
df_ni_g <- modelGeneVar(ni_g)
head(df_ni_g)
## DataFrame with 6 rows and 6 columns
## mean total tech bio p.value
## <numeric> <numeric> <numeric> <numeric> <numeric>
## ENSG00000160072.20 0.002933487 0.003856521 0.003375378 4.81143e-04 6.77935e-03
## ENSG00000228037.1 0.000567862 0.000769625 0.000653403 1.16223e-04 1.03295e-03
## ENSG00000142611.17 0.000240115 0.000336065 0.000276286 5.97791e-05 8.93833e-05
## ENSG00000157911.11 0.001207261 0.001281901 0.001389119 -1.07218e-04 9.09350e-01
## ENSG00000269896.2 0.000138593 0.000201914 0.000159470 4.24443e-05 2.01703e-06
## ENSG00000228463.10 0.001309151 0.001739702 0.001506358 2.33345e-04 3.65000e-03
## FDR
## <numeric>
## ENSG00000160072.20 2.39261e-02
## ENSG00000228037.1 4.39626e-03
## ENSG00000142611.17 4.66851e-04
## ENSG00000157911.11 1.00000e+00
## ENSG00000269896.2 1.34456e-05
## ENSG00000228463.10 1.38086e-02
df_ni_f <- modelGeneVar(ni_f)
head(df_ni_f)
## DataFrame with 6 rows and 6 columns
## mean total tech bio p.value
## <numeric> <numeric> <numeric> <numeric> <numeric>
## ENSG00000160072.20 0.002933487 0.003856521 0.003387065 4.69457e-04 1.10875e-02
## ENSG00000228037.1 0.000567862 0.000769625 0.000655665 1.13960e-04 2.06304e-03
## ENSG00000142611.17 0.000240115 0.000336065 0.000277242 5.88225e-05 2.31380e-04
## ENSG00000157911.11 0.001207261 0.001281901 0.001393929 -1.12028e-04 9.07633e-01
## ENSG00000269896.2 0.000138593 0.000201914 0.000160022 4.18921e-05 7.79048e-06
## ENSG00000228463.10 0.001309151 0.001739702 0.001511573 2.28129e-04 6.37534e-03
## FDR
## <numeric>
## ENSG00000160072.20 3.91194e-02
## ENSG00000228037.1 8.77760e-03
## ENSG00000142611.17 1.20830e-03
## ENSG00000157911.11 1.00000e+00
## ENSG00000269896.2 5.19107e-05
## ENSG00000228463.10 2.41147e-02
Plotting the variance for each data layer.
#plot variance
par(mfrow=c(1,3))
fit_ni_a <- metadata(df_ni_a)
plot(fit_ni_a$mean, fit_ni_a$var,
xlab = "mean of log-expression", ylab = "variance of log-expression", main = "NI-A")
points(fit_ni_a$mean, fit_ni_a$var, col = "red", pch = 16)
curve(fit_ni_a$trend(x), col = "dodgerblue", add = TRUE, lwd = 2)
fit_ni_g <- metadata(df_ni_g)
plot(fit_ni_g$mean, fit_ni_g$var,
xlab = "mean of log-expression", ylab = "variance of log-expression", main = "NI-G")
points(fit_ni_g$mean, fit_ni_g$var, col = "red", pch = 16)
curve(fit_ni_g$trend(x), col = "dodgerblue", add = TRUE, lwd = 2)
fit_ni_f <- metadata(df_ni_f)
plot(fit_ni_f$mean, fit_ni_f$var,
xlab = "mean of log-expression", ylab = "variance of log-expression", main = "NI-F")
points(fit_ni_f$mean, fit_ni_f$var, col = "red", pch = 16)
curve(fit_ni_f$trend(x), col = "dodgerblue", add = TRUE, lwd = 2)
par(mfrow=c(1,1))
Compute a list of HVGs for each data layer. Return the top 0.1 % HVGs per layer using getTopHVGs.
top_ni_a <- getTopHVGs(df_ni_a, prop = 0.1)
length(top_ni_a)
## [1] 1726
head(top_ni_a, n = 15)
## [1] "ENSG00000244734.4" "ENSG00000188536.13" "ENSG00000206172.8"
## [4] "ENSG00000143546.10" "ENSG00000147454.14" "ENSG00000115523.17"
## [7] "ENSG00000090382.7" "ENSG00000197746.15" "ENSG00000163220.11"
## [10] "ENSG00000162722.9" "C*04:01:01:01" "C*06:02:01:01"
## [13] "ENSG00000019582.17" "ENSG00000223609.11" "ENSG00000087086.15"
top_ni_g <- getTopHVGs(df_ni_g, prop = 0.1)
length(top_ni_g)
## [1] 1740
head(top_ni_g, n = 15)
## [1] "ENSG00000244734.4" "ENSG00000188536.13" "ENSG00000206172.8"
## [4] "ENSG00000143546.10" "ENSG00000147454.14" "ENSG00000115523.17"
## [7] "ENSG00000090382.7" "ENSG00000197746.15" "ENSG00000163220.11"
## [10] "ENSG00000162722.9" "ENSG00000019582.17" "ENSG00000087086.15"
## [13] "ENSG00000223609.11" "ENSG00000158869.11" "ENSG00000133742.14"
top_ni_f <- getTopHVGs(df_ni_f, prop = 0.1)
length(top_ni_f)
## [1] 1718
head(top_ni_f, n = 15)
## [1] "ENSG00000244734.4" "ENSG00000188536.13" "ENSG00000206172.8"
## [4] "ENSG00000143546.10" "ENSG00000147454.14" "ENSG00000115523.17"
## [7] "ENSG00000090382.7" "ENSG00000197746.15" "ENSG00000163220.11"
## [10] "ENSG00000162722.9" "ENSG00000019582.17" "ENSG00000087086.15"
## [13] "ENSG00000223609.11" "ENSG00000158869.11" "ENSG00000133742.14"
Compute PCA for each layer and store the results in the object. Its Important to make unique identifiers for each layer/run or the results will be overwritten and just saved as PCA. Here, the runPCA functions from the scater package is used.
set.seed(18)
redim_scae <- runPCA(redim_scae, ncomponents = 10, subset_row = top_ni_a, exprs_values = "logcounts", name = "PCA_a")
redim_scae <- runPCA(redim_scae, ncomponents = 10, subset_row = top_ni_g, exprs_values = "logcounts", name = "PCA_g")
redim_scae <- runPCA(redim_scae, ncomponents = 10, subset_row = top_ni_f, exprs_values = "logcounts", name = "PCA_f")
reducedDimNames(redim_scae)
## [1] "PCA_a" "PCA_g" "PCA_f"
The same goes for running t-SNE with the runTSNE function from the scater package. Unique identifiers are stated here for each layer as well.
set.seed(18)
redim_scae <- runTSNE(redim_scae, dimred= "PCA_a", perplexity = 50, name = "TSNE_a_50")
set.seed(18)
redim_scae <- runTSNE(redim_scae, dimred= "PCA_g", perplexity = 50, name = "TSNE_g_50")
set.seed(18)
redim_scae <- runTSNE(redim_scae, dimred= "PCA_f", perplexity = 50, name = "TSNE_f_50")
List of results from the performed reduced dimension analysis.
reducedDimNames(redim_scae)
## [1] "PCA_a" "PCA_g" "PCA_f" "TSNE_a_50" "TSNE_g_50" "TSNE_f_50"
Exemplary visualization for the t-SNE results on gene level for immune genes that relate to HLA-class I. In the given dataset, these are the immune genes HLA-A, HLA-B and HLA-C plotted alongside their alleles. This allows for insights into potential genetic differences shown on allele-level.
#determining which t-SNE (perplexity) to choose for all visualizations
which_tsne <- "TSNE_g_50"
#EGA_hla_a_alleles
tsne_g_a <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "HLA-A")
tsne_g_a1 <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "A*02:01:01:01")
tsne_g_a2 <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "A*26:01:01:01")
p2 <- tsne_g_a + ggtitle("HLA-A gene group") +
geom_point(position="jitter", size = 0.8,
aes(color = logcounts(redim_scae)["HLA-A",]), alpha = 0.8) +
theme_bw() +
theme(title = element_text(size = 17), plot.title = element_text(face = "bold"),
axis.title.y = element_text(size = 17), legend.box="vertical",
legend.text = element_text(size = 10), legend.margin=margin(l = 20), ) +
tsne_g_a1 + ggtitle("Allele A*02:01:01:01") +
geom_point(position="jitter", size = 0.8,
aes(color = logcounts(redim_scae)["A*02:01:01:01",]), alpha = 0.8) +
theme_bw() +
theme(title = element_text(size = 17), plot.title = element_text(face = "bold"),
axis.title.y=element_blank(), legend.position = "none") +
tsne_g_a2 + ggtitle("Allele A*26:01:01:01") +
geom_point(position="jitter", size = 0.8,
aes(color = logcounts(redim_scae)["A*26:01:01:01",]), alpha = 0.8) +
theme_bw() +
theme(title = element_text(size = 17), plot.title = element_text(face = "bold"),
axis.title.y=element_blank(), legend.position = "none") +
plot_layout(guide = "collect", nrow = 1, heights = c(1, 1, 1)) &
scale_colour_gradient2(name = "log-norm",
low = viridis(min(logcounts(redim_scae)[startsWith(rownames(logcounts(redim_scae)),
"HLA-A"),])),
mid = viridis(max(logcounts(redim_scae)[startsWith(rownames(logcounts(redim_scae)),
"HLA-A"),])/2),
high = viridis(max(logcounts(redim_scae)[startsWith(rownames(logcounts(redim_scae)),
"HLA-A"),])),
na.value = "gray80") &
theme(legend.box="vertical", legend.margin=margin(l = 10),
axis.title.x = element_text(size = 17), axis.text.x = element_text(size =15),
axis.text.y = element_text(size =15), legend.text = element_text(size=12),
legend.title = element_text(size=17))
p2
tsne_g_b <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "HLA-B")
tsne_g_b1 <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "B*51:01:01:01")
tsne_g_b2 <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "B*57:01:01:01")
p3 <- tsne_g_b + ggtitle("HLA-B gene group") +
geom_point(position="jitter", size = 0.8,
aes(color = logcounts(redim_scae)["HLA-B",]), alpha = 0.8) +
theme_bw() +
theme(title = element_text(size = 17), plot.title = element_text(face = "bold"),
axis.title.y = element_text(size = 17), legend.box="vertical",
legend.text = element_text(size = 10), legend.margin=margin(l = 20), ) +
tsne_g_b1 + ggtitle("Allele B*51:01:01:01") +
geom_point(position="jitter", size = 0.8,
aes(color = logcounts(redim_scae)["B*51:01:01:01",]), alpha = 0.8) +
theme_bw() +
theme(title = element_text(size = 17), plot.title = element_text(face = "bold"),
axis.title.y=element_blank(), legend.position = "none") +
tsne_g_b2 + ggtitle("Allele B*57:01:01:01") +
geom_point(position="jitter", size = 0.8,
aes(color = logcounts(redim_scae)["B*57:01:01:01",]), alpha = 0.8) +
theme_bw() +
theme(title = element_text(size = 17), plot.title = element_text(face = "bold"),
axis.title.y=element_blank(), legend.position = "none") +
plot_layout(guide = "collect", nrow = 1, heights = c(1, 1, 1)) &
scale_colour_gradient2(name = "log-norm",
low = viridis(min(logcounts(redim_scae)[startsWith(rownames(logcounts(redim_scae)),
"HLA-B"),])),
mid = viridis(max(logcounts(redim_scae)[startsWith(rownames(logcounts(redim_scae)),
"HLA-B"),])/2),
high = viridis(max(logcounts(redim_scae)[startsWith(rownames(logcounts(redim_scae)),
"HLA-B"),])),
na.value = "gray80") &
theme(legend.box="vertical", legend.margin=margin(l = 10),
axis.title.x = element_text(size = 17), axis.text.x = element_text(size =15),
axis.text.y = element_text(size =15), legend.text = element_text(size=12),
legend.title = element_text(size=17))
p3
which_tsne <- "TSNE_g_50"
#EGA_hla_c_alleles
tsne_g_g <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "HLA-C")
tsne_g_c1 <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "C*04:01:01:01")
tsne_g_c2 <- plotReducedDim(redim_scae, dimred = which_tsne, colour_by = "C*06:02:01:01")
p1 <- tsne_g_g + ggtitle("HLA-C gene group") +
geom_point(position="jitter", size = 0.8,
aes(color = logcounts(redim_scae)["HLA-C",]),alpha = 0.8) +
theme_bw() +
theme(title = element_text(size = 17), plot.title = element_text(face = "bold"),
axis.title.y = element_text(size = 17), legend.box="vertical",
legend.text = element_text(size = 10), legend.margin=margin(l = 20), ) +
tsne_g_c1 + ggtitle("Allele C*04:01:01:01") +
geom_point(position="jitter", size = 0.8,
aes(color = logcounts(redim_scae)["C*04:01:01:01",]), alpha = 0.8) +
theme_bw() +
theme(title = element_text(size = 17), plot.title = element_text(face = "bold"),
axis.title.y=element_blank(), legend.position = "none") +
tsne_g_c2 + ggtitle("Allele C*06:02:01:01") +
geom_point(position="jitter", size = 0.8,
aes(color = logcounts(redim_scae)["C*06:02:01:01",]), alpha = 0.8) +
theme_bw() +
theme(title = element_text(size = 17), plot.title = element_text(face = "bold"),
axis.title.y=element_blank(), legend.position = "none") +
plot_layout(guide = "collect", nrow = 1, heights = c(1, 1, 1)) &
scale_colour_gradient2(name = "log-norm",
low = viridis(min(logcounts(redim_scae)[startsWith(rownames(logcounts(redim_scae)),
"HLA-C"),])),
mid = viridis(max(logcounts(redim_scae)[startsWith(rownames(logcounts(redim_scae)),
"HLA-C"),])/2),
high = viridis(max(logcounts(redim_scae)[startsWith(rownames(logcounts(redim_scae)),
"HLA-C"),])),
na.value = "gray80") &
theme(legend.box="vertical", legend.margin=margin(l = 10),
axis.title.x = element_text(size = 17), axis.text.x = element_text(size =15),
axis.text.y = element_text(size =15), legend.text = element_text(size=12),
legend.title = element_text(size=17))
p1
sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] viridis_0.6.4 viridisLite_0.4.2
## [3] cowplot_1.1.1 patchwork_1.1.3
## [5] lubridate_1.9.2 forcats_1.0.0
## [7] stringr_1.5.0 dplyr_1.1.2
## [9] purrr_1.0.2 readr_2.1.4
## [11] tidyr_1.3.0 tibble_3.2.1
## [13] tidyverse_2.0.0 gridExtra_2.3
## [15] scater_1.22.0 ggplot2_3.4.3
## [17] scran_1.22.1 scuttle_1.4.0
## [19] SingleCellAlleleExperiment_0.0.0.9000 SingleCellExperiment_1.16.0
## [21] SummarizedExperiment_1.24.0 Biobase_2.54.0
## [23] GenomicRanges_1.46.1 GenomeInfoDb_1.30.1
## [25] IRanges_2.28.0 S4Vectors_0.32.4
## [27] BiocGenerics_0.40.0 MatrixGenerics_1.6.0
## [29] matrixStats_1.0.0 BiocStyle_2.22.0
##
## loaded via a namespace (and not attached):
## [1] Rtsne_0.16 ggbeeswarm_0.7.2
## [3] colorspace_2.1-0 bluster_1.4.0
## [5] XVector_0.34.0 BiocNeighbors_1.12.0
## [7] rstudioapi_0.15.0 farver_2.1.1
## [9] ggrepel_0.9.3 bit64_4.0.5
## [11] AnnotationDbi_1.56.2 fansi_1.0.4
## [13] xml2_1.3.5 R.methodsS3_1.8.2
## [15] sparseMatrixStats_1.6.0 cachem_1.0.8
## [17] knitr_1.43 jsonlite_1.8.7
## [19] cluster_2.1.4 dbplyr_2.3.3
## [21] R.oo_1.25.0 png_0.1-8
## [23] HDF5Array_1.22.1 BiocManager_1.30.22
## [25] compiler_4.1.2 httr_1.4.6
## [27] dqrng_0.3.0 Matrix_1.6-1
## [29] fastmap_1.1.1 limma_3.50.3
## [31] cli_3.6.1 BiocSingular_1.10.0
## [33] htmltools_0.5.6 prettyunits_1.1.1
## [35] tools_4.1.2 rsvd_1.0.5
## [37] igraph_1.5.1 gtable_0.3.3
## [39] glue_1.6.2 GenomeInfoDbData_1.2.7
## [41] rappdirs_0.3.3 Rcpp_1.0.11
## [43] jquerylib_0.1.4 rhdf5filters_1.6.0
## [45] vctrs_0.6.3 Biostrings_2.62.0
## [47] DelayedMatrixStats_1.16.0 xfun_0.40
## [49] beachmat_2.10.0 timechange_0.2.0
## [51] lifecycle_1.0.3 irlba_2.3.5.1
## [53] statmod_1.5.0 XML_3.99-0.14
## [55] org.Hs.eg.db_3.14.0 edgeR_3.36.0
## [57] zlibbioc_1.40.0 scales_1.2.1
## [59] hms_1.1.3 parallel_4.1.2
## [61] rhdf5_2.38.1 yaml_2.3.7
## [63] curl_5.0.2 memoise_2.0.1
## [65] sass_0.4.7 biomaRt_2.50.3
## [67] stringi_1.7.12 RSQLite_2.3.1
## [69] highr_0.10 ScaledMatrix_1.2.0
## [71] filelock_1.0.2 BiocParallel_1.28.3
## [73] rlang_1.1.1 pkgconfig_2.0.3
## [75] bitops_1.0-7 evaluate_0.21
## [77] lattice_0.21-8 Rhdf5lib_1.16.0
## [79] labeling_0.4.2 bit_4.0.5
## [81] tidyselect_1.2.0 magrittr_2.0.3
## [83] bookdown_0.35 R6_2.5.1
## [85] generics_0.1.3 metapod_1.2.0
## [87] DelayedArray_0.20.0 DBI_1.1.3
## [89] pillar_1.9.0 withr_2.5.0
## [91] KEGGREST_1.34.0 RCurl_1.98-1.12
## [93] crayon_1.5.2 DropletUtils_1.14.2
## [95] utf8_1.2.3 BiocFileCache_2.2.1
## [97] tzdb_0.4.0 rmarkdown_2.24
## [99] progress_1.2.2 locfit_1.5-9.8
## [101] grid_4.1.2 blob_1.2.4
## [103] digest_0.6.33 R.utils_2.12.2
## [105] munsell_0.5.0 beeswarm_0.4.0
## [107] vipor_0.4.5 bslib_0.5.1